Text classification based on pre-training model and label fusion
Hang YU, Yanling ZHOU, Mengxin ZHAI, Han LIU
Journal of Computer Applications, 2024, 44(3): 709-714. DOI: 10.11772/j.issn.1001-9081.2023030340
Abstract

Accurate classification of massive user comment text has significant economic and social value. Most existing text classification methods feed encoded text directly into a classifier while ignoring the prompt information contained in the label text. To address this issue, a Text and Label Information Fusion Classification model based on the pre-trained RoBERTa (Robustly optimized BERT pretraining approach), named TLIFC-RoBERTa, was proposed. First, a pre-trained RoBERTa model was used to obtain word vectors. Then, a Siamese network structure was used to train the text and label vectors separately, and the label information was mapped onto the text through interactive attention, thereby integrating the label information into the model. Finally, an adaptive fusion layer was applied to tightly fuse the text representation with the label representation for classification. Experimental results on the Today Headlines and THUCNews datasets show that, compared with mainstream deep learning models such as RA-Labelatt (replacing the static word vectors in the Label-based attention improved model with word vectors trained by RoBERTa-wwm) and LEMC-RoBERTa (RoBERTa combined with Label-Embedding-based Multi-scale Convolution for text classification), TLIFC-RoBERTa achieves the highest accuracy and the best classification performance on user comment datasets.
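To make the described pipeline concrete, below is a minimal PyTorch sketch of the architecture outlined in the abstract. It is an illustrative reconstruction, not the authors' released code: the use of Hugging Face's RobertaModel as the shared Siamese encoder, the dot-product form of the interactive attention, and the sigmoid-gated adaptive fusion layer are all assumptions made for the sketch.

```python
# Minimal sketch of a TLIFC-RoBERTa-style pipeline (illustrative assumptions:
# Hugging Face RobertaModel, dot-product interactive attention, gated fusion).
import torch
import torch.nn as nn
from transformers import RobertaModel

class TLIFCSketch(nn.Module):
    def __init__(self, num_classes, model_name="roberta-base"):
        super().__init__()
        # One shared encoder plays the role of both Siamese branches:
        # the same weights encode the text and the label prompt.
        self.encoder = RobertaModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.gate = nn.Linear(2 * hidden, hidden)  # adaptive fusion gate (assumed form)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, text_ids, text_mask, label_ids, label_mask):
        text_h = self.encoder(text_ids, attention_mask=text_mask).last_hidden_state
        label_h = self.encoder(label_ids, attention_mask=label_mask).last_hidden_state

        # Interactive attention: each text token attends over label tokens,
        # mapping label information onto the text representation.
        scores = torch.matmul(text_h, label_h.transpose(1, 2))        # (B, Lt, Ll)
        scores = scores.masked_fill(label_mask[:, None, :] == 0, -1e9)
        attn = torch.softmax(scores, dim=-1)
        label_ctx = torch.matmul(attn, label_h)                       # (B, Lt, H)

        # Adaptive fusion: a learned gate mixes the text and label views token-wise.
        g = torch.sigmoid(self.gate(torch.cat([text_h, label_ctx], dim=-1)))
        fused = g * text_h + (1 - g) * label_ctx

        # Mean-pool over valid text tokens, then classify.
        mask = text_mask.unsqueeze(-1).float()
        pooled = (fused * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
        return self.classifier(pooled)
```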
